<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-supervise program</title>
    <meta name="Description" content="s6: the s6-supervise program" />
    <meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<p>
 s6-supervise's behaviour is approximately the following:
</p>

<ul>
 <li> s6-supervise changes its current directory to <em>servicedir</em>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> It forks and executes the <tt>./run</tt> file in the service directory.
 <li> <tt>./run</tt> should be a long-lived process: it can chain load (i.e. exec into
other binaries), but should not die. It's the daemon that s6-supervise monitors
and manages. </li>
 <li> When <tt>./run</tt> dies, s6-supervise spawns <tt>./finish</tt>, if it exists.
This script should be short-lived: it's meant to clean up application state, if
necessary, that has not been cleaned up by <tt>./run</tt> itself before dying. </li>
 <li> When <tt>./finish</tt> dies, s6-supervise spawns <tt>./run</tt> again. </li>
 <li> s6-supervise operation can be controlled by the <a href="s6-svc.html">s6-svc</a>
program. It can be sent commands like "restart the service", "bring the service down", etc. </li>
 <li> s6-supervise normally runs forever. If told to exit by <a href="s6-svc.html">s6-svc</a>,
it waits for the service to go down one last time, then exits 0. </li>
</ul>

<p>
 For a precise description of s6-supervise's behaviour, check the
<a href="#detailed">Detailed operation</a> section below, as well as
the <a href="servicedir.html">service directory</a> page:
s6-supervise operation can be extensively configured by the presence
of certain files in the service directory.
</p>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>. The way to
tune s6-supervise's behaviour is via files in the
<a href="servicedir.html">service directory</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
that is also <a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe. If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For instance, to redirect stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if a
<a href="s6-svc.html">s6-svc -xd</a> command had been received </li>
 <li> SIGHUP: close its own stdin and stdout, and exit as soon as the
service stops, as if an <a href="s6-svc.html">s6-svc -x</a> command
had been received </li>
 <li> SIGQUIT: exit immediately without touching the service in any
way. </li>
 <li> SIGINT: send a SIGINT to the process group of the service, then
exit immediately. (The point here is to correctly forward SIGINT
in the case where s6-supervise is running in a terminal and the user
sent ^C to interrupt it.) </li>
</ul>

<a name="#detailed">
<h2> Detailed operation </h2>
</a>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It creates a <tt>supervise/</tt> subdirectory (if it doesn't exist yet) to
store its internal data. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> a <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up (i.e. there is no <tt>./down</tt> file),
s6-supervise spawns <tt>./run</tt>. One argument is given to the <tt>./run</tt>
program: <em>servicedir</em>, the name of the directory s6-supervise is being
run on. It is given exactly as given to s6-supervise, without recanonicalization.
In particular, if s6-supervise is being managed by <a href="s6-svscan.html">s6-svscan</a>,
<em>servicedir</em> is always of the form <tt><em>foo</em></tt> or <tt><em>foo</em>/log</tt>,
and <em>foo</em> contains no slashes. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> If there is a <tt>./notification-fd</tt> file in the service directory and,
at some point after the service has been spawned, s6-supervise is told that the
service is ready, it sends a <tt>'U'</tt> event to <tt>./event</tt>. There are
several ways to tell s6-supervise that the service is ready:
  <ul>
   <li> the daemon may <a href="notifywhenup.html">do so itself</a>. </li>
   <li> the run script may have forked a
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a> process that polls the
service for readiness. </li>
  </ul> </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled;
and it will have <em>servicedir</em> as third argument. </li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file, see the description in the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends a <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busylooping
if <tt>./run</tt> exits too quickly. If the service has been <em>ready</em> for more
than one second, it will restart immediately, but if it is not <em>ready</em> when
it dies, s6-supervise will always pause for 1 second before spawning it again. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends a <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>

<p>
 Make sure to also check the <a href="servicedir.html">service directory</a>
documentation page, for the full list of files that can be present in a service
directory and impact s6-supervise's behaviour in any way.
</p>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does not hurt. </li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> s6-supervise always spawns its child in a new session, as a session leader.
The goal is to protect the supervision tree from misbehaved services that would
send signals to their whole process group. Nevertheless, s6-supervise's handling of
SIGINT ensures that its service is killed if you happen to run it in a terminal and
send it a ^C. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process; mostly to change the service state and send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write. </li>
 <li> If <em>servicedir</em> isn't writable by s6-supervise, for any reason, then the
<a href="s6-svc.html">s6-svc</a> <tt>-D</tt> and <tt>-U</tt> commands will not work
properly since s6-supervise will be unable to create or delete a
<em>servicedir</em><tt>/down</tt> file; in this case s6-supervise will print a warning
on stderr, and perform the equivalent of <tt>-d</tt> or <tt>-u</tt> instead &mdash; it
will just be unable to change the permanent service configuration. </li>
</ul>

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error everytime it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory - so unfortunately, it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so every instance maintains as little
data as possible, so it uses a very small
amount of non-sharable memory. It is not a problem to have several
dozens of s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>
